Automatic Speech Recognition: A Deep Learning Approach
Automatic Speech Recognition (ASR) is a technology for enabling interaction between human speech and machines. It draws on a variety of techniques, including the following.
- Gaussian mixture models (GMMs)
- hidden Markov models (HMMs)
- mel-frequency cepstral coefficients (MFCCs) and their derivatives
- n-gram language models (LMs)
- discriminative training, and various adaptation techniques
- GMM-HMM sequence discriminative training
This book introduces and explains the ASR techniques listed above.
The book also points to a number of other textbooks related to ASR and deep learning.
- Deep Learning: Methods and Applications, by Li Deng and Dong Yu (June 2014)
- Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, by Joseph Keshet, Samy Bengio (January 2009)
- Speech Recognition Over Digital Channels: Robustness and Standards, by Antonio Peinado and Jose Segura (September 2006)
- Pattern Recognition in Speech and Language Processing, by Wu Chou and Biing-Hwang Juang (February 2003)
- Speech Processing—A Dynamic and Optimization-Oriented Approach, by Li Deng and Doug O’Shaughnessy (June 2003)
- Spoken Language Processing: A Guide to Theory, Algorithm and System Development, by Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon (April 2001)
- Digital Speech Processing, Synthesis, and Recognition, Second Edition, by Sadaoki Furui (June 2001)
- Speech Communications: Human and Machine, Second Edition, by Douglas O’Shaughnessy (June 2000)
- Speech and Language Processing—An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James Martin (April 2000)
- Speech and Audio Signal Processing, by Ben Gold and Nelson Morgan (April 2000)
- Statistical Methods for Speech Recognition, by Fred Jelinek (June 1997)
- Fundamentals of Speech Recognition, by Lawrence Rabiner and Biing-Hwang Juang (April 1993)
- Acoustical and Environmental Robustness in Automatic Speech Recognition, by Alex Acero (November 1992)
In this series of posts, I plan to write up my notes as I study the topics covered in this book. The top-level headings in the book's table of contents are as follows.
- Introduction
Part 1 Conventional Acoustic Models
- Gaussian Mixture Models
- Hidden Markov Models and the Variants
Part 2 Deep Neural Networks
- Deep Neural Networks
- Advanced Model Initialization Techniques
Part 3 Deep Neural Network-Hidden Markov Model Hybrid Systems for Automatic Speech Recognition
- Deep Neural Network-Hidden Markov Model Hybrid Systems
- Training and Decoding Speedup
- Deep Neural Network Sequence-Discriminative Training
Part 4 Representation Learning in Deep Neural Networks
- Feature Representation Learning in Deep Neural Networks
- Fuse Deep Neural Network and Gaussian Mixture Model Systems
- Adaptation of Deep Neural Networks
Part 5 Advanced Deep Models
- Representation Sharing and Transfer in Deep Neural Networks
- Recurrent Neural Networks and Related Models
- Computational Network
- Summary and Future Directions
References
[1] Dong Yu, Li Deng, Automatic Speech Recognition: A Deep Learning Approach, 2015